
    BiplotGUI: Interactive Biplots in R

    Biplots simultaneously provide information on both the samples and the variables of a data matrix in two- or three-dimensional representations. The BiplotGUI package provides a graphical user interface for the construction of, interaction with, and manipulation of biplots in R. The samples are represented as points, with coordinates determined by the choice of biplot, principal coordinate analysis or multidimensional scaling. Various transformations and dissimilarity metrics are available. Information on the original variables is incorporated by linear or non-linear calibrated axes. Goodness-of-fit measures are provided. Additional descriptors can be superimposed, including convex hulls, alpha-bags, point densities and classification regions. Interactive features include dynamic prediction of variable values, zooming, and drag-and-drop of points and axes. Output can easily be exported to the R workspace for further manipulation. Three-dimensional biplots are incorporated via the rgl package. The user needs almost no knowledge of R syntax.
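    BiplotGUI is an R package and its internals are not described in the abstract. As a hedged, language-neutral illustration of the underlying idea, the Python/NumPy sketch below builds the simplest case, a PCA biplot from the SVD of the centred data matrix, and shows how a linear calibrated axis reads off predicted variable values. The function names `pca_biplot` and `predict_variables` are hypothetical, and the variance-share fit is only one of several possible goodness-of-fit measures.

```python
import numpy as np

def pca_biplot(X, dims=2):
    """Sketch of a PCA biplot: sample points plus one axis direction per variable.

    X is an (n_samples, n_vars) numeric matrix; each column is centred first.
    Returns (sample_coords, axis_directions, fit).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Singular value decomposition of the centred matrix: Xc = U diag(s) Vt
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_r = Vt[:dims].T            # (n_vars, dims): row j is the direction of variable j's axis
    sample_coords = Xc @ V_r     # projection of each sample onto the display plane
    # Share of total variation reproduced in the display (one simple goodness-of-fit measure)
    fit = float((s[:dims] ** 2).sum() / (s ** 2).sum())
    return sample_coords, V_r, fit

def predict_variables(points, axis_directions, col_means):
    """Predicted value of every variable for point(s) in the display.

    Projecting a point z orthogonally onto the axis of variable j and reading
    the calibration gives z . v_j in centred units; adding the column mean
    returns it to the original scale.
    """
    return col_means + np.asarray(points) @ axis_directions.T
```

    Because the third column of a matrix such as `[[1, 2, 3], [2, 1, 3], ...]` can be the sum of the first two, a two-dimensional biplot of such data has a fit of 1 and the calibrated axes reproduce every original value exactly; with real data the predictions are approximations whose quality the fit measure summarises.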

    Visualisation of quadratic discriminant analysis and its application in exploration of microbial interactions

    Background: When diseased and non-diseased patients are compared in order to identify the aspects associated with the specific disease, the diseased patients are often observed to have more variability than the non-diseased patients. In such cases quadratic discriminant analysis, which estimates a separate covariance structure for each group, is required. Because the groups then have different covariance matrices, the canonical variate transformation cannot be used to obtain a visual representation of the discrimination and group separation. Results: In this paper an alternative method is proposed: the different transformations for the different groups are combined into a single representation of the sample points with classification regions. To associate differences in the variables with group discrimination, a biplot is produced that includes information on the variables, the samples and their relationship.
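    The abstract turns on the fact that quadratic discriminant analysis keeps a separate covariance matrix per group. The sketch below shows only that standard classification rule, with score δ_k(x) = −½ log|Σ_k| − ½ (x − μ_k)ᵀ Σ_k⁻¹ (x − μ_k) + log π_k; it is not the paper's visualisation method, and the function names are hypothetical.

```python
import numpy as np

def fit_qda(X, y):
    """Estimate a mean vector, covariance matrix and prior for each group."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    params = {}
    for g in np.unique(y):
        Xg = X[y == g]
        # Unlike LDA, each group keeps its own covariance matrix
        params[g] = (Xg.mean(axis=0), np.cov(Xg, rowvar=False), len(Xg) / len(y))
    return params

def qda_scores(x, params):
    """Quadratic discriminant score of point x for each group."""
    scores = {}
    for g, (mu, S, prior) in params.items():
        d = np.asarray(x, dtype=float) - mu
        _, logdet = np.linalg.slogdet(S)           # log|Sigma_k|, computed stably
        mahal = d @ np.linalg.solve(S, d)          # (x - mu)^T Sigma_k^{-1} (x - mu)
        scores[g] = -0.5 * logdet - 0.5 * mahal + np.log(prior)
    return scores

def qda_predict(x, params):
    """Assign x to the group with the largest score."""
    scores = qda_scores(x, params)
    return max(scores, key=scores.get)
```

    With a tight group and a widely spread group centred at the same point, the boundary between them is a closed curve rather than a straight line: points near the centre go to the tight group and distant points to the spread group. No linear (canonical variate) rule can reproduce that separation.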

    Visualising Incomplete Data with Subset Multiple Correspondence Analysis

    Determining the cause of missing values is challenging, but it is an important step in selecting the correct analysis technique for missing data. This paper presents a new approach to identifying the missing data mechanism (MDM) by applying cluster analysis to biplots of data with missing observations. Subset multiple correspondence analysis (sMCA) enables an isolated analysis of a chosen subset while preserving the scaffolding of the original data set. Multivariate categorical data sets are frequently represented in a coded dummy matrix, referred to as an indicator matrix. Additional category levels can be created in the indicator matrix to account for the unobserved information, which has the advantage of not forfeiting any observed information. The extended indicator matrix easily partitions a data set into observed and unobserved subsets. sMCA biplots are used for the visual exploration of the subsets. Configurations of the incomplete subsets enable the recognition of non-response patterns, which could aid in the identification of a particular MDM. The missing at random (MAR) MDM refers to missing responses that depend on the observed information; it is expected to be identified by patterns and groupings in the incomplete sMCA biplot. The missing completely at random (MCAR) MDM states that all observations have the same probability of not being captured; it could be identified by a random cloud of points in the incomplete sMCA biplot. The partitioning around medoids (pam) clustering technique is used to establish the number of clusters present in an incomplete sMCA biplot. A simulation study confirmed that the number of sufficient clusters that can be identified differs between MAR and MCAR simulated data sets. A real data set is also explored, and its MDM is identified using the results of the simulation study as guidelines.
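    A minimal sketch of the two data-handling steps described above, under stated assumptions. The first step builds an indicator matrix with one extra "missing" level per variable. The second step computes row coordinates for a chosen column subset by subset correspondence analysis of that matrix: masses come from the full matrix and only the subset's standardised residuals enter the SVD. The sentinel `MISSING_LEVEL`, the use of `None` for non-response and both function names are hypothetical. The paper's exact implementation is not given, and the pam clustering step is not shown.

```python
import numpy as np

MISSING_LEVEL = "<missing>"  # hypothetical label for the extra non-response level

def extended_indicator(data, levels):
    """Indicator matrix with one extra 'missing' level per variable.

    data:   list of rows, one category per variable, None where unobserved.
    levels: for each variable, the list of its observed category levels.
    Returns the 0/1 matrix Z, the column labels and a mask of missing-level columns.
    """
    labels = []
    for j, lv in enumerate(levels):
        labels += [(j, c) for c in lv] + [(j, MISSING_LEVEL)]
    col = {lab: k for k, lab in enumerate(labels)}
    Z = np.zeros((len(data), len(labels)))
    for i, row in enumerate(data):
        for j, v in enumerate(row):
            Z[i, col[(j, MISSING_LEVEL if v is None else v)]] = 1
    missing_mask = np.array([c == MISSING_LEVEL for _, c in labels])
    return Z, labels, missing_mask

def subset_ca_rows(Z, cols, dims=2):
    """Row principal coordinates for a subset of columns of Z.

    Row/column masses and expected values come from the FULL matrix, so the
    subset keeps the scaffolding of the original data; only the selected
    columns enter the SVD. Assumes every column occurs at least once.
    """
    P = Z / Z.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardised residuals
    U, s, _ = np.linalg.svd(S[:, cols], full_matrices=False)
    return U[:, :dims] * s[:dims] / np.sqrt(r)[:, None]
```

    Passing `missing_mask` as `cols` gives coordinates for the incomplete (unobserved) subset, and `~missing_mask` gives the observed subset. The incomplete-subset coordinates are what a k-medoids (pam) routine would then cluster.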

    Succession and determinants of the early life nasopharyngeal microbiota in a South African birth cohort

    Background: Bacteria colonizing the nasopharynx play a key role as gatekeepers of respiratory health. Yet the dynamics of early life nasopharyngeal (NP) bacterial profiles remain understudied in low- and middle-income countries (LMICs), where children have a high prevalence of risk factors for lower respiratory tract infection. We investigated longitudinal changes in NP bacterial profiles, and associated exposures, among healthy infants from low-income households in South Africa. Methods: We used short fragment (V4 region) 16S rRNA gene amplicon sequencing to characterize NP bacterial profiles from 103 infants in a South African birth cohort, at monthly intervals from birth through the first 12 months of life and every six months thereafter until 30 months. Results: Corynebacterium and Staphylococcus were dominant colonizers at 1 month of life; however, these were rapidly replaced by Moraxella- or Haemophilus-dominated profiles by 4 months. This succession was almost universal and largely independent of a broad range of exposures. Warm weather (summer), lower gestational age, maternal smoking, no day-care attendance, antibiotic exposure, or a low height-for-age z score at 12 months were associated with higher alpha and beta diversity. Summer was also associated with higher relative abundances of Staphylococcus, Streptococcus, Neisseria, or anaerobic gram-negative bacteria, whilst spring and winter were associated with higher relative abundances of Haemophilus or Corynebacterium, respectively. Maternal smoking was associated with higher relative abundances of Porphyromonas. Antibiotic therapy (or isoniazid prophylaxis for tuberculosis) was associated with higher relative abundances of anaerobic taxa (Porphyromonas, Fusobacterium and Prevotella) and with lower relative abundances of the health-associated taxa Corynebacterium and Dolosigranulum. HIV exposure was associated with higher relative abundances of Klebsiella or Veillonella and lower relative abundances of an unclassified genus within the family Lachnospiraceae. Conclusions: In this intensively sampled cohort, there was rapid and predictable replacement of early profiles dominated by health-associated Corynebacterium and Dolosigranulum with those dominated by Moraxella and Haemophilus, independent of exposures. Season and antibiotic exposure were key determinants of NP bacterial profiles. Understudied but highly prevalent exposures in LMICs, including maternal smoking and HIV exposure, were associated with NP bacterial profiles.
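    The abstract reports relative abundances and alpha and beta diversity but does not name the indices used. The Shannon index (alpha) and Bray-Curtis dissimilarity (beta) in this sketch are common choices, included as an assumption rather than as the study's method. The function names are hypothetical, and each profile is assumed to be a list of read counts over the same taxa.

```python
import math

def relative_abundances(counts):
    """Convert raw read counts per taxon into proportions of the sample total."""
    total = sum(counts)
    return [c / total for c in counts]

def shannon(counts):
    """Shannon alpha diversity H = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    return -sum(p * math.log(p) for p in relative_abundances(counts) if p > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count profiles over the same taxa.

    0 means identical profiles; 1 means the profiles share no taxa.
    """
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))
```

    A profile split evenly between two taxa has H = ln 2, while a profile dominated by a single taxon has H = 0. That is why early, single-genus-dominated profiles tend to score low on alpha diversity.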

    Optimizing 16S rRNA gene profile analysis from low biomass nasopharyngeal and induced sputum specimens

    Careful consideration of experimental artefacts is required in order to successfully apply high-throughput 16S ribosomal ribonucleic acid (rRNA) gene sequencing technology. Here we introduce experimental design, quality control and “denoising” approaches for sequencing low biomass specimens. Results: We found that bacterial biomass is a key driver of 16S rRNA gene sequencing profiles generated from bacterial mock communities, and that the use of different deoxyribonucleic acid (DNA) extraction methods [DSP Virus/Pathogen Mini Kit® (Kit-QS) and ZymoBIOMICS DNA Miniprep Kit (Kit-ZB)] and storage buffers [PrimeStore® Molecular Transport medium (Primestore) and Skim-milk, Tryptone, Glucose and Glycerol (STGG)] further influences these profiles. Kit-QS better represented hard-to-lyse bacteria from bacterial mock communities than Kit-ZB. Primestore storage buffer yielded lower levels of background operational taxonomic units (OTUs) from low biomass bacterial mock community controls than STGG. In addition to bacterial mock community controls, we used technical repeats (nasopharyngeal and induced sputum specimens processed in duplicate, triplicate or quadruplicate) to further evaluate the effect of specimen biomass and participant age at specimen collection on the resulting sequencing profiles. We observed a positive correlation (r = 0.16) between specimen biomass and participant age at specimen collection: low biomass technical repeats (< 500 16S rRNA gene copies/μl) were primarily collected at < 14 days of age. We found that low biomass technical repeats also produced higher alpha diversities (r = −0.28), 16S rRNA gene profiles similar to no template controls (Primestore), and reduced sequencing reproducibility. Finally, we show that the use of statistical tools for in silico contaminant identification, as implemented through the decontam package in R, provides better representations of indigenous bacteria following decontamination. Conclusions: We provide insight into experimental design, quality control steps and “denoising” approaches for 16S rRNA gene high-throughput sequencing of low biomass specimens. We highlight the need for careful assessment of DNA extraction methods and storage buffers, sequence quality and reproducibility, and in silico identification of contaminant profiles in order to avoid spurious results.

    Multivariate data analysis identifies natural clusters of Tuberous Sclerosis Complex Associated Neuropsychiatric Disorders (TAND)

    Background: Tuberous Sclerosis Complex (TSC), a multi-system genetic disorder, is associated with a wide range of TSC-Associated Neuropsychiatric Disorders (TAND). Individuals have apparently unique TAND profiles, which challenges diagnosis, psycho-education and intervention planning. We proposed that identification of natural TAND clusters could lead to personalized identification and treatment of TAND. Two small-scale studies showed that cluster and factor analysis could identify clinically meaningful natural TAND clusters. Here we set out to identify definitive natural TAND clusters in a large, international dataset. Method: Cross-sectional, anonymized TAND Checklist data of 453 individuals with TSC were collected from six international sites. Data-driven methods were used to identify natural TAND clusters. Mean squared contingency coefficients were calculated to produce a correlation matrix, and various cluster analyses and exploratory factor analysis were examined. Statistical robustness of clusters was evaluated with 1000-fold bootstrapping, and internal consistency was calculated with Cronbach’s alpha. Results: Ward’s method rendered seven natural TAND clusters with good robustness on bootstrapping. Cluster analysis showed significant convergence with an exploratory factor analysis solution, and, with the exception of one cluster, internal consistency of the emerging clusters was good to excellent. Clusters showed good clinical face validity. Conclusions: Our findings identified a data-driven set of natural TAND clusters from within highly variable TAND Checklist data. The seven natural TAND clusters could be used to train families and professionals and to develop tailored approaches to identification and treatment of TAND. Natural TAND clusters may also have differential aetiological underpinnings and responses to molecular and other treatments.
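    The Method names two textbook quantities without giving formulas. The sketch below shows them under that assumption. The mean square contingency coefficient is φ² = χ²/n for the cross-tabulation of two categorical items, the entry a correlation-like matrix would hold. Cronbach's alpha for k items is α = k/(k − 1) · (1 − Σ var(item) / var(total)). The function names are hypothetical, and the Ward clustering and bootstrap steps are not shown.

```python
import numpy as np

def mean_square_contingency(a, b):
    """phi^2 = chi^2 / n for the cross-tabulation of two categorical vectors."""
    a, b = np.asarray(a), np.asarray(b)
    ua, ia = np.unique(a, return_inverse=True)
    ub, ib = np.unique(b, return_inverse=True)
    table = np.zeros((len(ua), len(ub)))
    np.add.at(table, (ia, ib), 1)                     # observed counts
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return chi2 / n

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()        # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)         # variance of the summed scale
    return k / (k - 1) * (1 - item_var / total_var)
```

    For two yes/no checklist items, φ² is 1 when the answers always agree and 0 when they are statistically independent. Alpha is 1 when all items in a cluster move together perfectly.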

    Longitudinal Population Dynamics of Staphylococcus aureus in the Nasopharynx During the First Year of Life

    Background: Staphylococcus aureus colonization is a risk factor for invasive disease. Few studies have used strain genotype data to study S. aureus acquisition and carriage patterns. We investigated S. aureus nasopharyngeal (NP) carriage in infants in an intensively sampled South African birth cohort. Methods: Nasopharyngeal swabs were collected at birth and fortnightly from 137 infants through their first year of life. S. aureus was characterized by spa-typing. The incidence of S. aureus acquisition and the median carriage duration for each genotype were determined. S. aureus carriage patterns were defined by combining the carrier index (proportion of samples testing positive for S. aureus) with genotype diversity measures. Persistent or prolonged carriage was defined by a carrier index ≥0.8 or ≥0.5, respectively. Risk factors for time to acquisition of S. aureus were determined. Results: Eighty-eight percent (121/137) of infants acquired S. aureus at least once. The incidence of acquisition at the species and genotype level was 1.83 and 2.8 episodes per child-year, respectively. No children had persistent carriage (defined as carrier index of >0.8). At the species level 6% had prolonged carriage, while only 2% had prolonged carriage with the same genotype. Carrier index correlated with the absolute number of spa-CCs carried by each infant (r = 0.5; 95% CI 0.35–0.62). Time to first acquisition of S. aureus was shorter in children from households with ≥5 individuals (HR 1.06, 95% CI 1.07–1.43), with S. aureus carrier mothers (HR 1.5, 95% CI 1.2–2.47), or with a positive tuberculin skin test during the first year of life (HR 1.81, 95% CI 0.97–3.3). Conclusion: Using measures of genotype diversity, we showed that S. aureus NP carriage is highly dynamic in infants. Prolonged carriage with a single strain occurred rarely; persistent carriage was not observed. A correlation was observed between carrier index and genotype diversity.
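    The carrier index and its thresholds come from the Methods: persistent carriage at ≥0.8 and prolonged carriage at ≥0.5. The Results restate persistence as >0.8, and this sketch follows the Methods wording. Assumptions, all hypothetical: each infant's swab series is a list holding a spa type string for a positive swab or `None` for a negative one, the function names are invented, and indices below 0.5 get the label "other".

```python
def carrier_index(swabs, genotype=None):
    """Proportion of swabs positive for S. aureus.

    With genotype=None any positive swab counts (species level); otherwise
    only swabs carrying that spa type count (genotype level).
    """
    if genotype is None:
        positives = sum(1 for s in swabs if s is not None)
    else:
        positives = sum(1 for s in swabs if s == genotype)
    return positives / len(swabs)

def carriage_pattern(ci, persistent=0.8, prolonged=0.5):
    """Classify a carrier index using the thresholds from the Methods."""
    if ci >= persistent:
        return "persistent"
    if ci >= prolonged:
        return "prolonged"
    return "other"  # hypothetical label; the abstract does not name this group

def genotype_count(swabs):
    """Number of distinct spa types carried, a simple genotype diversity measure."""
    return len({s for s in swabs if s is not None})
```

    An infant positive on 6 of 10 swabs, but with two different spa types on 3 swabs each, is a prolonged carrier at the species level while each genotype's own index is only 0.3. This is the gap between species-level and same-genotype prolonged carriage that the Results describe.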